Improved Spelling Error Detection and Correction for Arabic

نویسندگان

  • Mohammed Attia
  • Pavel Pecina
  • Younes Samih
  • Khaled F. Shaalan
  • Josef van Genabith
چکیده

A spelling error detection and correction application is based on three main components: a dictionary (or reference word list), an error model and a language model. While most of the attention in the literature has been directed to the language model, we show how improvements in any of the three components can lead to significant cumulative improvements in the overall performance of the system. We semi-automatically develop a dictionary of 9.3 million fully inflected Arabic words using a morphological transducer and a large corpus. We improve the error model by analysing error types and creating an edit distance based re-ranker. We also improve the language model by analysing the level of noise in different sources of data and selecting the optimal subset to train the system on. Testing and evaluation experiments show that our system significantly outperforms Microsoft Word 2010, OpenOffice Ayaspell and Google Docs.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Design and implementation of Persian spelling detection and correction system based on Semantic

Persian Language has a special feature (grapheme, homophone, and multi-shape clinging characters) in electronic devices. Furthermore, design and implementation of NLP tools for Persian are more challenging than other languages (e.g. English or German). Spelling tools are used widely for editing user texts like emails and text in editors.  Also developing Persian tools will provide Persian progr...

متن کامل

Arib$@$QALB-2015 Shared Task: A Hybrid Cascade Model for Arabic Spelling Error Detection and Correction

In this paper we present the Arib system for Arabic spelling error detection and correction as part of the second Shared Task on Automatic Arabic Error Correction. Our system contains many components that address various types of spelling error and applies a combination of approaches including rule based, statistical based, and lexicon based in a cascade fashion. We also employed two core model...

متن کامل

NAIST at the HOO 2012 Shared Task

This paper describes the Nara Institute of Science and Technology (NAIST) error correction system in the Helping Our Own (HOO) 2012 Shared Task. Our system targets preposition and determiner errors with spelling correction as a pre-processing step. The result shows that spelling correction improves the Detection, Correction, and Recognition Fscores for preposition errors. With regard to preposi...

متن کامل

Error Correction for Arabic Dictionary Lookup

We describe a new Arabic spelling correction system which is intended for use with electronic dictionary search by learners of Arabic. Unlike other spelling correction systems, this system does not depend on a corpus of attested student errors but on studentand teacher-generated ratings of confusable pairs of phonemes or letters. Separate error modules for keyboard mistypings, phonetic confusio...

متن کامل

Generalized Character-Level Spelling Error Correction

We present a generalized discriminative model for spelling error correction which targets character-level transformations. While operating at the character level, the model makes use of wordlevel and contextual information. In contrast to previous work, the proposed approach learns to correct a variety of error types without guidance of manuallyselected constraints or language-specific features...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012